Topic Segmentation: Application of Mathematical Morphology to Textual Data

Authors

  • Sébastien Lefèvre
  • Vincent Claveau

Abstract

Mathematical Morphology (MM) offers a generic theoretical framework for data processing and analysis. Nevertheless, it is still used mostly for image processing and analysis, and attempts to apply MM to other kinds of data remain rare. We believe MM can provide relevant solutions for data processing and analysis in a far broader range of application fields. To illustrate this, we focus on textual data and show how morphological operators (here, morphological segmentation by the watershed transform) can be applied to such data. We thus provide an original MM-based solution to the topic segmentation problem, a typical problem in natural language processing and information retrieval (IR). More precisely, we consider TV broadcasts through their transcripts obtained by automatic speech recognition. To perform topic segmentation, we compute the similarity between successive segments using vectorization, a technique recently introduced in the IR field. We then apply a gradient operator to build a topographic surface, which is segmented with the watershed transform. This new topic segmentation technique is evaluated on two corpora of TV broadcasts, on which it outperforms other existing approaches. Although it relies on very common morphological operators (namely, the standard watershed transform), this work shows the potential of MM for non-image data.
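
The pipeline summarized above (similarity between successive segments, gradient, watershed) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how segments are weighted or how the topographic surface is built. The sketch therefore (i) uses raw term counts, (ii) assumes that vectorization re-describes each segment by its cosine similarities to a set of pivot segments (here, all segments of the transcript), (iii) takes the dissimilarity between successive vectorized segments as a 1D discrete gradient, and (iv) floods that 1D relief with a Meyer-style watershed seeded at every regional minimum. All function names are hypothetical.

```python
import heapq
import itertools
import math
from collections import Counter


def bag_of_words(text):
    # Term counts of one transcript segment (no stop-word removal or lemmatization).
    return Counter(text.lower().split())


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def vectorize(bows, pivots=None):
    # Assumption: "vectorization" re-describes each segment by its similarities
    # to a set of pivot documents; by default every segment is a pivot.
    pivots = bows if pivots is None else pivots
    return [{j: cosine(b, p) for j, p in enumerate(pivots)} for b in bows]


def gradient(vectors):
    # Discrete 1D gradient: relief[i] = dissimilarity between segments i and i+1.
    return [1.0 - cosine(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]


def watershed_1d(relief):
    # Meyer-style flooding of a 1D relief. Each regional minimum (plateaus
    # included) seeds one basin; returns one label per sample, with 0 marking
    # samples where two different basins meet (the watershed line).
    n = len(relief)
    labels = [-1] * n  # -1: not reached yet
    next_label, i = 1, 0
    while i < n:
        j = i
        while j + 1 < n and relief[j + 1] == relief[i]:
            j += 1  # extend the plateau that starts at i
        lower_left = i > 0 and relief[i - 1] < relief[i]
        lower_right = j < n - 1 and relief[j + 1] < relief[i]
        if not lower_left and not lower_right:  # regional minimum
            for k in range(i, j + 1):
                labels[k] = next_label
            next_label += 1
        i = j + 1

    heap, tie = [], itertools.count()
    queued = [False] * n

    def push_neighbours(p):
        for q in (p - 1, p + 1):
            if 0 <= q < n and labels[q] == -1 and not queued[q]:
                queued[q] = True
                heapq.heappush(heap, (relief[q], next(tie), q))

    for p in range(n):
        if labels[p] > 0:
            push_neighbours(p)
    while heap:  # flood in order of increasing altitude
        _, _, p = heapq.heappop(heap)
        seen = {labels[q] for q in (p - 1, p + 1) if 0 <= q < n and labels[q] > 0}
        if len(seen) == 1:
            labels[p] = seen.pop()
            push_neighbours(p)
        else:
            labels[p] = 0  # two basins meet here
    return labels


def topic_boundaries(segments):
    # A watershed sample at gap i means a topic change between segments i and i+1.
    relief = gradient(vectorize([bag_of_words(s) for s in segments]))
    return [i for i, lab in enumerate(watershed_1d(relief)) if lab == 0]


segments = [
    "goal match striker", "match goal fans", "fans striker goal",  # sport
    "rain storm wind", "storm rain flood", "wind flood rain",       # weather
]
print(topic_boundaries(segments))  # [2]
```

Because every regional minimum seeds its own basin, a noisy ASR transcript would typically yield many spurious boundaries. Smoothing the relief or keeping only sufficiently deep minima (for instance with an h-minima filter) before flooding is the usual morphological remedy.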

Similar resources

Application of Topic Segmentation in Audiovisual Information Retrieval

Segmentation into topically coherent segments is one of the crucial points in information retrieval (IR). Suitable segmentation may improve the results of an IR system and help users find relevant passages faster. Segmentation is especially important in audiovisual recordings, in which navigation is difficult. We present several methods used for topic segmentation, based on textual, audio a...

Topic Segmentation of TV-Streams by Mathematical Morphology and Vectorization

A fine-grained segmentation of radio or TV broadcasts is an essential step for most multimedia processing. Applying segmentation algorithms to the speech transcripts seems straightforward. Yet most of these algorithms are not suited to short segments or noisy data. In this paper, we propose a new segmentation technique inspired by the image segmentation field and relying on a...

Mathematical Morphology based gray scale Image Segmentation using improved watershed transform

Mathematical morphology provides a systematic approach to analyzing the geometric characteristics of signals or images and has been applied to many applications such as edge detection, object segmentation and noise suppression. Image segmentation is one of the most important categories of image processing. The watershed transformation for image segmentation using mathematical morphology is widely used. When w...

Review of Application of Mathematical Morphology in Crop Disease Recognition

Mathematical morphology is a non-linear image processing method with a two-dimensional convolution operation, including binary morphology, gray-level morphology and color morphology. Erosion, dilation, opening and closing are the basic operations of mathematical morphology. Mathematical morphology can be used for edge detection, image segmentation, noise elimination, feature extraction an...

Segmentation thématique : apport de la vectorisation (Topic segmentation: the contribution of vectorization)

This paper deals with topic segmentation of TV broadcasts using their transcription obtained by automatic speech recognition. Topic segmentation has been studied for several years, and most often the techniques proposed rely on information retrieval techniques to compute similarities between segments. In this paper, we propose a new segmentation approach inspired by mathematical morphology stud...

Publication date: 2011